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ABSTRACT 


We have designed the Standard Map Machine (SMM) as an answer to 
the intensive computational requirements involved in the study of chaotic 
behavior in nonlinear systems. The high-speed and high-precision per¬ 
formance of this computer is due to its simple architecture specialized 
to the numerical computations required of nonlinear systems. In this re¬ 
port, we discuss the design and implementation of this special-purpose 
machine. 
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1 Introduction 


High-speed computation is playing an ever increasing role in the pursuit of scientific 
endeavors. However, large areas of scientific research remain as yet inaccessable due 
to insufficient computing resources. There is demand in a wide variety of applica¬ 
tions for faster and faster computation tools. 

A technique that has proven effective in tackling such computationally intensive 
problems is special-purpose computing. Special-purpose computing has been used 
in a number of different areas with much success. For instance, the Digital Orrery, 
designed to simulate the long-term behavior of the solar system, performed the 
longest integration of the orbits of the planets to produce evidence that the motion 
of Pluto is chaotic. We see the power of special-purpose computing in competitive 
chess, where the computer Deep Thought has defeated numerous high-ranking chess 
masters, as well as chess programs running on general-purpose supercomputers. 

The performance advantage of special-purpose machines is derived from a number of 
factors. The application-specific design allows ttiore highly optimized architectures 
of less complexity than their general-purpose counterparts. These simpler machines 
are more cost-effective and can be dedicated solely to a particular problem, rather 
than having to be shared, as is the case for large expensive supercomputers. 

In this light, we have designed the Standard Map Machine (SMM) as an answer to 
the intensive computational requirements involved in the study of chaotic behavior 
in a class of nonlinear systems. SMM is tailored to perform nonlinear mappings 
with high speed and high precision. The prototype implementation fits on a single 
board and performs at an average rate of about 2.5 MFlops for the class of problems 
that it was designed to solve. The novel computer design and implementation are 
presented in the following sections. 
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2 Initial Considerations 


The Standard Map Machine (SMM) was designed as a solution to the computa¬ 
tional problem of studying the long-term behavior of simple nonlinear mappings. A 
number of initial considerations were taken into account in designing this machine. 
These included size, precision, types of operations needed to be performed, and the 
availability of parts. It was desired that the machine be relatively small and simple. 
We wanted it to be a backend processor to a host computer, such as a conventional 
workstation, and buildable in a relatively short amount of time. It must be able to 
compute with 64-bit double precision accuracy in order to minimize roundoff errors 
that could potentially be fatal for repeated iterations of a mapping. Also, the ma¬ 
chine needed to be able to perform standard functions that were common in simple 
nonlinear mappings. For these reasons we chose a currently available set of Weitek 
floating point chips, consisting of an ALU chip and a multiplier chip, to serve as the 
main computational unit. 

To determine the necessary functionality of the machine, a good example to use 
would be the standard map [3], given by the following set of equations: 

x n+1 = (x n + y n +i) mod 2w (1) 

y n + i = (y n + ksin(x n )) mod 2 tt (2) 

It is evident from observing this pair of difference equations that addition, multi¬ 
plication, range normalization, and elementary trigonometric functions needed to 
be performed. Addition, multiplication, and range normalization could all be per¬ 
formed as straightforward standard operations using the floating point arithmetic 
units. Trigonometric functions are computed using Taylor series expansions. The 
fewest number of terms needed for the desired expansion was determined by Cheby- 
shev economization [8]. For instance, the sine function in the standard map can be 
calculated to 64-bit IEEE double-precision accuracy using a seven coefficient series 
expansion of the following form [7]: 


sin(x) = x(si + x 2 (s 3 + x 2 (s 5 + x 2 (s r + x 2 (s 9 + x 2 (s n + -Si 3 ^ 2 )))))) (3) 


We used Horner’s method of polynomial evaluation to rewrite the sine expression 
to minimize the number of multiplications. SMM needs to be able to efficiently 
compute these types of expressions. 
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3 Architectural Design 


Figure 1 shows the architectural configuration of SMM. The machine has two sec¬ 
tions: the computation unit and the microcontroller. Data path specialization and 
instruction pipelining result in maximum utilization of the multiplier/ALU floating 
point module. 


3.1 The Computation Unit 

The computation unit has four parts: the data memory, the register file, the multi¬ 
plier/ALU module, and the feed-through latch. These parts are interconnected by 
32-bit data paths. Data is moved between SMM and the host computer via an inter¬ 
face to the data-memory system. Values that are to be used for a computation are 
written to the fast dual-port register file and then fed into the multiplier/ALU unit. 
The multiplier/ALU component is made up of a Weitek 1264 64-bit IEEE floating 
point multiplier and a Weitek 1265 64-bit IEEE floating point ALU. Normally, the 
floating point chips are operated in pipeline mode to maximize throughput. In this 
mode, a new 64-bit multiplier operation can be initiated every four clock cycles with 
a latency of ten clock cycles, while a new 64-bit ALU operation can be started every 
two clock cycles with a latency of twelve clock cycles. (The minimum clock cycle 
time for the Weitek chips is 60 nanoseconds.) The results of floating point oper¬ 
ations are then sent to the data memory, register file, feed-through latch, or any 
combination of the three. Intermediate results are ususally stored in the register 
file. The data memory is used if there is no free space left in the register file or if 
the value is not needed immediately. If the result is to be used as an input to the 
multiplier/ALU module immediately after it has been calculated, it is fed directly 
to the feed-through latch. The result must be written to data memory if it is to be 
sent to the host computer. 

The feed-through latch is an additional data path feature developed to optimize 
the computation of polynomial approximations of special functions such as sine 
and cosine. From the series expansion in Equation 3, we see that such functions 
are computed serially for a small number of execution units. 1 Because of this, the 
computation of each successive iteration in mappings such as the standard map is 
a serial computation, since the calculation of both x and y values depends on the 
value of the sine function. The feed-through latch takes advantage of the serial 

1 If there are a large number of execution units, the polynomial can be rewritten so that the 
terms can be calculated in parallel, independent of each other. 
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nature of the calculation by allowing the results of computations from the floating 
point unit to be fed directly back into the inputs to the floating point unit without 
passing through the register file. Data values can remain in the data paths for 
repeated computations without ever having to be written back to the register file, 
thus reducing the latency of the computation. 

Besides numerical data calculations, the Weitek unit is also used to perform condi¬ 
tional tests, such as comparing the values of two numbers, and converting floating 
point numbers to integers for data memory write address calculations. Write ad¬ 
dresses are calculated such that the low eight bits of the resulting integer are sent to 
the memory address register and latched in as the high eight bits of a memory write 
address. This allows fast calculation of page addresses to which to write the results 
of data computations. Furthermore, the amount of data memory that can be used 
to write results is not limited by the microcode memory size, as would be the case if 
write addresses were not calculated but rather were fixed by the microcode instruc¬ 
tion. Data memory read addresses are fixed by the microcode instruction because 
much of the data memory, where the final results of computations are stored until 
being sent to the host, will not need to be read for the kinds of applications that 
SMM will run, and direct addressing provides better latency so that data can be 
read immediately upon execution of the read instruction. 


3.2 The Microcontroller 

The microcontroller consists of a microcode loader, the microcode memory, a variable- 
length pipeline register, and a condition code selector. Programs for the computer 
are assembled into microinstructions on the host machine and downloaded from the 
host through the microcode loader, which then stores the microinstructions into the 
micromemory. When the microcontroller is operating, the program stored in the mi¬ 
crocode memory is executed. The instruction located at the current microaddress is 
read from the memory and sent to the variable-length pipeline register. The instruc¬ 
tion is clocked in through the register, resulting in the proper control signals being 
sent to the computation unit as well as the microaddress of the next instruction be¬ 
ing sent to microcode memory. The length of the pipeline is different for each control 
signal so that each of the signals coming out of the microcontroller arrives at the ap¬ 
propriate module at the right time. In addition, the length of the entire instruction 
pipeline is varied by the microinstruction, depending upon whether the multiplier 
or ALU is being activated. This variation is necessary because the pipelined latency 
of the multiplier differs from that of the ALU. The condition code selector sends 
branching instructions to the microcode memory depending on the current state 
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of the machine and the result of conditionals computed by the computation unit. 
Thus, data dependent instructions are permitted. 


read* 1 
data 


execute-1 operand 


write-1 

data 


read-2 
data 


execute-2 operand 


write-2 

data 


read-3 
data 


execute-3 operand 


write-3 

data 


Figure 2: Model of Instruction Flow 


A 96-bit instruction word length was chosen. The function of each instruction bit is 
described in Appendix A. The long word instruction format allows parallel execution 
of the subsystems of the computation unit. Instruction flow is as shown in Figure 2. 
One instruction word, and thus one single-precision floating point operation, can be 
executed every clock cycle. This performance is possible because of the variable- 
length hardware pipeline register. As mentioned above, the microcode dynamically 
alters the length of the pipeline so that the control signals arrive to the appropriate 
subsystem modules at exactly the right time. Furthermore, the hardware pipelining 
easily facilitates multiple branching by guaranteeing that the write address for the 
result of the data operation will not disappear until the result has been written, 
despite the interleaving of instructions. 

Each instruction word can be described functionally as shown in Figure 3. The 
memory field is used to write a data value stored in data memory to the register 
file. The computation field is used to perform an operation using the multiplier/ALU 
module. The branching field is used for conditional branching. 
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4 Implementation 


Our implementation of SMM uses standard off-the-shelf parts, primarily Advanced 
Schottky TTL technology. Single-port SRAM is used for both the data and mi¬ 
crocode memory systems. There are 2 K words of microcode instruction memory, 8 
K words of data memory, and 64 words of register file locations, where an instruction 
word is 96 bits in length and a data word is 32 bits in length. Four-level pipeline 
registers that combinatorially select which pipeline level the resulting output comes 
from are used to implement the microcontroller pipeline. The system clock, write 
pulses, and latch pulses are all derived using a single delay line. Schematics are 
given in Appendix B. 


4.1 Timing 

Ti min g analysis of the machine is shown in Appendix B. Positive edge timing is 
used. Maximum and minimum propagation delays are taken into account, as well 
as the tolerances on each tap of the delay line. The minimum clock cycle time is 
62.5 nanoseconds as limited by the critical data path latency from the output of the 
Weitek unit through the feed-through latch back to the input of the Weitek unit. 
As a result, a 70 nanosecond delay line was chosen for the initial implementation. 
Data reads are done during the first section of the clock cycle and data writes are 
done on the later section of the cycle. 


4.2 Construction 

The computer was built on a 366 mm x 220 mm wire-wrap board with ground 
and voltage planes to reduce noise problems. A picture of the completed hardware 
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Figure 3: Functionality of an Instruction Word 
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is shown in Figure 4. .IfiF bypass capacitors were soldered directly between Vcc 
and GND on dual-in-line chip packages and 47 fiF electrolytic bypass capacitors 
were distributed all over the board. In addition, 4.7 fiF tantalum capacitors were 
connected between Vcc and GND near the register files and Weitek chips because 
of their extra sensitivity to noise. Clock signals were fed through multiple buffers 
to avoid fanout and ground bounce problems. Twisted pairs were used for some of 
the longer wires to reduce noise. Several of the wires required termination because 
of heavy undershoot resulting from transmission line bounce [6]. Some of the wires 
connected to CMOS inputs were also attached to TTL inputs to reduce undershoot 
by taking advantage of TTL input clamping diodes. The current implementation 
uses a 100 nanosecond delay line to produce the system clock, resulting in a 100 
nanosecond clock cycle time. Some minor adjustments of the wiring that have not 
yet been done at this time would reduce the transmission line noise that is present 
and allow a faster cycle time. 


4.3 Host Interface 

The Standard Map Machine was implemented to communicate with an HP 9000 
Series 300 computer system [5]. The HP 98630 Breadboard Interface was used to 
construct memory map hardware to communicate between the backend processor 
and the host machine. The asynchronous nature of the protocol requires that the 
processor be stopped or in a “waiting state” in order for data to be transmitted to 
and from the processor and the host. 

The SMM “waiting state” is a special state the machine can enter while the sys¬ 
tem clock is running. While in this state, SMM promises not to access the data 
memory so that the host machine can access the memory safely without worry of 
bus contention. It is possible to instruct SMM to enter and exit waiting states from 
software, thus allowing data to be sent to and from the host during the execution 
of a program. 

Another way that the host can read and write data to SMM is to send an interrupt 
signal to SMM, stopping the system clock. In this state, the machine is no longer 
running and data can be transmitted between SMM and the host. In addition, 
programs can be loaded into the microcode memory since the microcontroller has 
stopped running. Note that programs cannot be loaded if SMM is running (i.e. the 
system clock is on). To start SMM, a “start” signal is sent to SMM, which sends 
the starting instruction from the microcode loader to the instruction register and 
enables the SMM clock. 
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Figure 4: The Standard Map Machine 
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Software was written to allow the user to interface with SMM in Scheme. Commands 
that are available to the user are listed in the Appendix D. These commands give 
the user great flexibility in the operation of the machine as well as easy access to 
the data memory system where the results of computations are stored. 


5 Programming SMM 

This section describes the programming language we created for SMM and the 
assembler which converts that language into SMM microinstructions. 


5.1 The SMM Language 

We designed a register transfer language (RTL) for writing SMM programs. The 
specific syntax of this language is detailed in Appendix C. Every RTL instruction has 
three mandatory parts: an instruction opcode, a data source, and a data destination. 
Optionally, an instruction specifies a conditional or unconditional branch in a fourth 
clause. 

Although RTL is similar to actual SMM microinstructions, it is sufficiently ab¬ 
stracted from the SMM hardware that programs may be written quickly. For ex¬ 
ample, instruction macros allow programmers to specify double-precision operations 
in one RTL instruction, even though it takes at least two cycles for SMM to move 
double-precision numbers around. The expansion of double-precision macros is one 
of the many tasks handled by the assembler. 

The choice of the RTL model is well suited to SMM. All fields of an SMM instruction 
are issued simultaneously; hardware pipelining ensures that control signals arrive at 
each pipe stage during the appropriate cycle. This approach avoids the overhead 
of flushing the pipeline before branches and the complexity of implementing soft¬ 
ware pipelines in the assembler. Furthermore, the instruction pipeline allows the 
programmer to completely specify an instruction at the beginning of the operation, 
rather than having to manually follow data through SMM 2 . The instruction pipeline 
allows RTL programs to be written for SMM quickly and efficiently. 

2 Note that programmers still have to be aware of how long an operation takes so as to not 
reference a data value before it has been computed. 
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5.2 The Assembler 


Chasm , the CHaos ASseMbler, is our assembler for translating the SMM register 
transfer language into microcode. Chasm is a multi-pass assembler; each phase 
performs a piece of the assembly operation on the entire instruction stream until 
the desired instruction bits are obtained. Below we detail the five chasm assembly 
phases. 


5.2.1 Phase 1: Macro expansion 

Phase 1 of chasm is responsible for reading the source code list and expanding 
all instruction macros. Macros come in two flavors: math macros and instruction 
macros. Math operation macros allow the programmer to use simplified expressions 
like (F->I <F00>) (float-to-int convert <FOO>) instead of the fully specified oper¬ 
ation (F2I <F00> (NONE) ). Instruction macros expand one instruction into many, 
such as the (WAIT) loop macro and the DA (double-precision assign) opcode. The 
output of chasm Phase 1 is a stream of valid, one-cycle instructions. 

When Phase 1 encounters a double-precision instruction macro, it replaces the macro 
with two single-precision instructions. The first instruction uses the same source and 
destination as the original instruction, but modifies the math operation (if any) to 
tell the multiplier/ALU that this is the beginning of a double-precision instruction. 
The second instruction uses sources and destinations at addresses one greater than 
the original instruction. This allows easy access to 64-bit quantities by using double¬ 
precision instructions with the addresses of the low 32-bit words. For example, the 
instruction: 

(DA (R 40) (+ (R 10) (R 20))) 

;; Assign the sum of quantities in (R20,R21) and (R10.RH) to (R40.R41) 


expands into the two instructions: 

(A (R 40) (D1+ (R 10) (R 20))) 
(A (R 41) (D2+ (R 11) (R 21))) 


Here D1+ and D2+ perform the double-precision addition function. 
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5.2.2 Phase 2: Instruction Placement 


Phase 2 of the assembler is responsible for assigning to each macro-expanded in¬ 
struction a location in the SMM microcode memory. Generally, instructions are 
placed sequentially starting at microcode address 0. Difficulties arise, however, 
when placing conditional instructions. 

Conditional branches in SMM are implemented by substituting one of seven one-bit 
condition codes for the least significant bit (lsb) of the microcode address. Thus, 
when we place a conditional instruction, we must make certain that: 


1. The next instruction (i.e. “condition false, proceed to next sequential instruc¬ 
tion”) in the instruction stream must be located at an even address (lsb = 
0). 


2. The instruction immediately after the next sequential instruction in the stream 
(lsb = 1) must be a copy of the instruction the machine is supposed to jump 
to. 


Phase 2 also performs two other important tasks. First, it makes sure that instruc¬ 
tions are properly “linked” so that execution proceeds properly from one instruction 
to the next, regardless of where those instructions are located in memory. Second, 
this phase places a (JUMP 0) instruction at the end of the code block. Since a copy 
of the last instruction downloaded remains in the instruction latches, this forces 
execution to begin at the start of the program. 


5.2.3 Phase 3: Label Collection 

The RTL allows any instruction to have a label for referencing by other instructions. 
Labels are useful for designating the destinations of IF clauses and JUMP instructions. 
Chasm Phase 3 associates an instruction address with each instruction label present 
in the source code. These associations are used later when calculating the instruction 
addresses of (GOTO : label) clauses. 


5.2.4 Phase 4: Instruction Duplication 

In Phase 2, chasm copied instructions that are destinations of conditional jumps 
by marking the high-order instruction with a COPY token along with the label of 
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the instruction to be copied. Now that Phase 3 has collected label information and 
calculated instruction linkage information, these tokens may be replaced by actual 
instructions. Chasm Phase 4 copies instructions as necessary, making sure that 
copied instructions link execution to the same instruction as the original. By the 
end of Phase 4, we are ready to build the bit strings to be downloaded to SMM. 


5.2.5 Phase 5: Bit String Construction 

Phase 5 of chasm converts symbolic RTL into the 96-bit strings to be downloaded 
into SMM. For each instruction, Phase 5 initializes a NOP bit string 3 and then pro¬ 
ceeds to modify that string based on the contents of the instruction. The output of 
Phase 5, the *chasm-bits* vector, is a vector of bit strings which may be down¬ 
loaded (by the download-code procedure) directly into SMM for execution. 


6 Performance 

The inherent maximum performance of any machine using one Weitek 1264/65 chip 
set is approximately 4 MFlops for multiplier operations, clocking the machine at the 
mini mum clock cycle time of 60 nanoseconds. SMM approximately achieves this ab¬ 
solute maximum when clocked at its maximum rate of 16 MHz. Actual performance 
of this computer for the class of problems it was designed to solve is limited by the 
throughput of the multiplier due to the serial nature of the computation. The cur¬ 
rent hardware is running at a slower clock rate of 10 MHz, thus yielding about 2.5 
MFlops for the single board machine. Since the ratio of time spent in host-processor 
intervention versus that in numerical calculation is small, these performance levels 
are sustainable for long periods of time. 

This high performance is achieved by specializing the data paths for numerical com¬ 
putations and the serial nature of the problems that the computer was designed to 
solve. Furthermore, variable-length hardware pipelining maximizes the throughput 
of the machine by encouraging interleaved instructions. The major advantage in 
the SMM design is that it allows the user to express large portions of real problems 
as efficient sequences of microcode instructions. These instructions perform large 
amounts of computation without ever requiring the host to manage control and data 
flow on the board. 

3 That is, a bit string which has the effect of a No Operation instruction. 
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7 Conclusion 


Studying chaotic behavior in nonlinear systems through simulation requires numer¬ 
ous calculations. The Standard Map Machine is a special computer designed and 
implemented to perform these intensive calculations, such as the computation of 
the standard map for millions of iterations. The prototype implementation fits on 
a small wire-wrap board and provides 2.5 MFlops of double-precision computing 
power for the class of problems that it was designed to solve. Its high-speed and 
high-precision performance are due to the specialization of its architecture to the 
numerical computations required of nonlinear systems. 

This backend computer has numerous advantages over conventional floating point 
accelerators and math coprocessors. Almost all of the computations can be per¬ 
formed on SMM itself using its own fast microcontroller, rather than relying on the 
slower instruction control of the host machine. This reduces the communication 
costs between the host and the backend processor, a factor that heavily reduces the 
performance of other machines. Furthermore, unlike costly supercomputing power, 
SMM serves as a cost-effective instrument that can be completely dedicated for long 
periods of time to numerical simulations of nonlinear systems. 

As technology improves, we claim that machines of a similar nature can be designed 
and implemented as effective instruments for scientific computing. Current chip 
technology can already provide at least twice the performance of the floating point 
multiplier/ALU chip set used in SMM. The simple architecture of special-purpose 
computers allows numerical operations to be implemented more efficiently than is 
feasible on a general-purpose machine. Specialized numerical architectures provide 
the edge in high-speed and high-precision performance necessary for intensive com¬ 
putations. 
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A Control Bits of the SMM Instruction Word 


control bits 

functional description of bits 

possible values for control field 

0 - 9 

microaddress bits 1-10 

0 - 1023 

10 

wait 

0 for not waiting, 1 for waiting 

11 

regfile B-port write-enable 

0 for not enable, 1 for enable 

12 

regfile A-port latch-enable 

0 for not enable, 1 for enable 

13 

regfile B-port latch-enable 

0 for not enable, 1 for enable 

14 - 19 

regfile A-port read-address 

0 - 63 

20 

regfile A-port/latch output 

0 for regfile, 1 for latch 

21 - 26 

regfile B-port write-address 

0-63 

27 - 32 

regfile B-port read-address 

0 - 63 

33 

regfile B-port output 

0 for enable, 1 for not enable 

34 - 39 

1264 multipler function 

0 - 63, see Weitek spec. 

40 - 45 

1265 ALU function 

0 - 63, see Weitek spec. 

46 

1264 load enable 

0 for enable, 1 for not enable 

47 

1265 load enable 

0 for enable, 1 for not enable 

48 - 52 

1264 load control 

0-31, see Weitek spec. 

53 - 57 

1265 load control 

0 - 31, see Weitek spec. 

58 - 60 

1265 unload control 

0 - 7, see Weitek spec. 

61 

multiplier/ALU select 

0 for ALU, 1 for multipler 

62 

regfile A-port write-enable 

0 for not enable, 1 for enable 

63 

feed-through latch pulse 

0 for do not latch, 1 for latch 

64 

SRAM write-enable 

0 for not enable, 1 for enable 

65 

SMAR address latch pulse 

0 for not enable, 1 for enable 

66 - 71 

regfile A-port write-address 

0 - 63 

72 

Weitek-to-SRAM buffer 

0 for enable, 1 for not enable 

73 - 75 

conditon code select A, B, C 

select ccO - cc7 

76 - 78 

1264 unload control 

0-7, see Weitek spec. 

79 - 90 

SRAM address 

0 - 4095 

91 

default condition code 

0 or 1, set by assembler 

92 

condition code latch pulse 

0 for not enable, 1 for enable 

93 - 95 

unused 



Table 1: Description of SMM Instruction Bits 














































































B Design Schematics and Timing Diagrams 


The design schematics and timing diagrams for SMM and the host interface are 
shown in the following figures. Figure 5 gives a functional description of the overall 
system with a detailed view of the system clock design showing the clock generator 
as well as the derived write pulses and latch pulses. Note that the clock buffers 
that were included in the actual construction to reduce fanout are not shown in 
the diagram. Figure 6 gives a detailed diagram of the wire connections in the 
computation unit. Figure 7 is a detailed drawing of the microcontroller. There are 
three bits of the 96-bit instruction word that are currently unused. Figure 8 shows 
the hardware for the host interface as well as the timing of interface signals. The 
signals from the host are transmitted from the breadboard interface card that is 
plugged into the backplane of the host, through ribbon cables, to the control and 
data buffers. The breadboard interface is not shown here. Figure 9 shows the timing 
analysis of the system. 
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Figure 5: Functional Description with System Clock 
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C The Register Transfer Language 


This section describes the instruction set of the SMM register transfer language. We 
first present a symbolic description of the language, and then describe the individual 
components of an instruction. 


Grammar for the SMM register transfer language 


<program> 

<instruction> 

<delabeled-instr> 

<opcode-operands> 


<dest> 

<source> 

< reference-1> 
<reference-2> 

< reference > 
<reg-ref> 
<regnum> 
<sram-ref> 
<sramaddr> 

< math-clause > 

< math-op > 
<next-address> 
<absolute-addr> 
<label> 
<label-char> 
<label-num-char> 
<load-mode- value > 
<if-clause> 


< condition-code > 


G 


G 


G 

G 

G 


(cinstruction > *) 

(<label> <delabeled-instr>) | (<delabeled-instr>) 
<opcode-operands> | <opcode-operands> <if-clause> 

A <dest> < source > | L <dest> < source > | 

DA <dest> < source > | DL <dest> < source > | 

NOP | SIM <opcode-operands> <opcode-operands> | 

SET-WAIT-BIT | CLEAR-WAIT-BIT | 

ALU-LOAD-MODE <load-mode-value> | 

MUL-LOAD-MODE <load-mode-value> | 

JUMP <next-address> | WAIT 

< reference^ 1> 

<reference-l> | <math-clause> 

<reference> | (LATCH) | (NONE) 

<reference> | (NONE) 

<reg-ref> | <sram-ref> 

(R <regnum>) | (REG <regnum>) | (REGISTER <regnum>) 
{x|0 < x < 63,x G N} 

(S <sramaddr>) | (SRAM <sramaddr>) 

{x|0 < x < 4091, x G N} 

(<math-op> <reference-l> <reference-2>) 

+ | * | -* | - | CMP | CMPO | I | F2I | I2F | F2IS 
<label> | < absolute-addr> 

{x|0 < x < 2047, x G N} 

:<label-char><label-num-char>* 

[a-zA-Z] 

[a-zA-ZO-9] 

{x|0 < x < 63,x G N} 

(LCC) | (LCC GOTO <next-address>) | 

(GOTO <next-address>) | 

(LCC IF <condition-code> <next-address>) | 

(IF <condition-code> <next-address>) 

< | > | ccO | ccl | cc2 | cc3 | cc4 | cc5 | cc6 | cc7 
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Most instructions are of the form: 


(<opcode> <dest> <source> <if-clause>) 


although not all instructions neceassarily contain all four elements. 

C.l Instruction Opcodes 

The first element of every instruction is the <opcode>, an identifier for the type of 
operation requested. Each opcode is described briefly below: 

C.1.1 NOP 

The NDP instruction does nothing (“No-OPeration”). It occupies one SMM cycle. 
WOP instructions can have labels and <if-clauses> like other instructions. 


C.l.2 ASSIGN 

The ASSIGN instruction, denoted by an “A” or “L” opcode (L for “load”) is the 
most common instruction in SMM RTL. It is used to move data from one place to 
another, possibly performing some function on it along the way. The <dest> of an 
ASSIGN is given the value of the <source>. If the <source> is a <math-clause>, 
the value of the math operation is calculated first, and that value is then assigned 
to the <dest>. Example: 

(A (S 42) (+ (R 5) (R 10))) 


Assign the sum of register 5 and register 10 to SRAM 42. 


C.l.3 DOUBLE ASSIGN 

The DOUBLE ASSIGN instructions “DA” and “DL” are macro instructions which 
permit easy manipulation of 64-bit double-precision numbers. A DOUBLE ASSIGN 
instruction is expanded at assembly time into two sequential ASSIGN instructions, 
with the source and destination locations “incremented” to reference the high-order 
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word for the second ASSIGN. All arguments to a DOUBLE ASSIGN should be low- 
order words of double-precision numbers (assumed to be even registers or SRAM 
locations). Example: 

(DA (S 8) (+ (R 4) (S 42))) 


expands into the two instructions 

(A (S 8) (D1+ (R 4) (S 42))) 

(A (S 9) (D2+ (R 5) (S 43))) 


The pseudo-operations “D1+” and “D2-b” represent the two phases of a 64-bit add 
operation (it takes two cycles to completely load both operands). 

Note: If a “DA” instruction has a label, the label will be associated with the first 
ASSIGN instruction. If a “DA” instruction has an (LCC) as part of it’s <if-clause>, 
that latch will occur on the first ASSIGN instruction. The remainder of the <if- 
clause> will take place on the second instruction. 


C.1.4 SIMULTANEOUS 

The SIMULTANEOUS (“SIM”) instruction allows the user to specify two orthogonal 
operations which should occur during the same cycle on SMM. For example, the user 
may wish to load a register with a value from SRAM while starting a computation 
in the ALU. Since these two operations do not share any portion of SMM hardware, 
they may be made SIMULTANEOUS. Example: 

(SIM (A (R 4) (S 8)) (A (S 12) (+ (R 22) (R 26)))) 


Note: The SIMULTANEOUS instruction is single-precision only. No double-precision 
version of “SIM” currently exists. 


C.1.5 SET-WAIT-BIT 

The SET-WAIT-BIT instruction forces the default “wait bit” of successive generated 
instructions to be set (1). Otherwise this instruction is identical to the NOP instruc¬ 
tion. 


27 




C.1.6 CLEAR-WAIT-BIT 


The CLEAR-WAIT-BIT instruction forces the default “wait bit” of successive gener¬ 
ated instructions to be cleared (0). Otherwise this instruction is identical to the NOP 
instruction. 


C.1.7 ALU-LOAD-MODE 

The ALU-LOAD-MODE instruction is used to effect “load mode” operations inside the 
ALU. The “load mode” functions deal with such operations as rounding mode and 
denormalized numbers. These functions are detailed in the Weitek specifications. 
The <load-mode-value> given in the ALU-LOAD-MODE instruction is sent as the load 
control field to the ALU. 


C.1.8 MUL-LOAD-MODE 

The MUL-LOAD-MODE instruction is identical to the ALU-LOAD-MODE instruction ex¬ 
cept that it deals with the multiplier instead of the ALU. 

C.1.9 JUMP 

The JUMP instruction causes execution to jump to the instruction at <next-address>. 
This instruction is functionally equivalent to a (NOP (GOTO <next-adress>)) in¬ 
struction. 


C.1.10 WAIT 

The WAIT instruction is a macro which expands into a five-instruction loop. Upon 
entering the loop, the WAIT signal is set (raised). Execution remains in the loop 
until SMM receives a DONE signal from the host, at which time the WAIT signal 
is cleared. WAIT allows easy interlocking between the host and SMM. A (WAIT) 
expands into the following five instructions (n is a counter incremented with each 
WAIT instruction): 

(set-wait-bit) 

(:*wait-ra-start* nop (none) (none) (if cc4 :*wait-n-loop*)) 

(nop (goto :*wait-n-done*)) 
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(:*wait-n-loop* nop (goto :*wait-n-start)) 
(:*wait-n-done clear-wait-bit) 


C.2 Destinations 

In most instructions, the second list element is the <dest>, or destination. The 
<dest> tells chasm what machine location is affected by the action of the instruc¬ 
tion, such as where a computed value should be stored. There are four types of 
destinations. 


C.2.1 Register Destinations 

By far the most commonly used destination is the register file. The register file 
contains 64 memory locations, or registers. Generally, values are loaded from the 
SRAM into the register file at the beginning of a program, computation is carried 
out using the register file locations, and final results ate written back to the SRAM. 

A register file destination consists of a keyword (either R, REG, or REGISTER) and an 
address. Valid addresses range between 0 and 63. Thus, the destination 

(REG 42) 


represents address 42 in the register file. 

C.2.2 SRAM Destinations 

Sending data to the Static RAM is similar to using the register files. All SRAM 
destinations have a keyword (S or SRAM) and an address between 0 and 4091. To 
send data to SRAM location 17, simply use the destination 

(SRAM 17) 
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C.2.3 The LATCH Destination 


In order to facilitate quick reuse of computed values, SMM has a feedback path 
from the output of the ALU/multiplier back to the input port. This feedback path 
is known as the LATCH, and to use it, simply use the destination: 

(LATCH) 


By its nature, the LATCH does not have state. Data values which are sent to the 
LATCH must be referenced from the LATCH exactly when they are available on 
the bus (either 12 (or 10) cycles after initiating an ALU (or multiplier) operation). 
Failure to source the LATCH properly will cause whatever data which was sent to 
the LATCH to be lost. The assembler will attempt to issue warnings whenever it 
thinks data may be lost due to incorrect timing, but this function is not currently 
fully implemented. 

C.2.4 The NONE Destination 

Sometimes it is necessary to compute a value but not save the computed result, 
such as when performing comparisons. In these cases, data on the bus should not 
be saved anywhere. To indicate that the result of an instruction is irrelevant, use 
the (NONE) destination. The (NONE) destination guarantees that no device will read 
the bus and load the results of the instruction. 


C.3 Sources 

Just as most instructions need to know where to send their results, they also need 
to know from where to get their results. The <source> of an instruction contains 
this information. Sources may be simple or complex. Simple sources are exactly like 
destinations. They may be register file locations, SRAM locations, the LATCH, or 
NONE. Complex sources represent the result of a mathematical operation on two 
simple sources. All complex sources are of the form: 

(<math-op> <a-port-source> <b-port-source>) 

For example, the expression (* (R 17) (R 42) ) stands for the product of the values 
in register locations 17 and 42. Many valid math operations exist; more may be 
added as necessary. 
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Note: If the LATCH is to be used in a complex source, it must be listed as the 
A-port source. Due to the design of SMM, the contents of the LATCH may not be 
loaded into the B-port. 


C.4 If Clauses 

The fourth and final portion of each instruction is the <if-clause>. The <if-clause> 
contains all information necessary for computing where execution should proceed 
upon completion of the instruction. If clauses come in three major types: none, 
GOTO, and IF. In addition, the keyword LCC may be added to any if clause to 
indicate that SMM should latch the condition bits. 

C.4.1 No If Clause Present 

If an instruction does not have an if clause (or the if clause is simply the direction 
(LCC), execution proceeds to the next sequential instruction. 

C.4.2 The GOTO-type If Clause 

An if clause of the form (GOTO <address>) or (LCC GOTO <address>) causes 
execution to unconditionally branch to the instruction at <address>. The speci¬ 
fied <address> may be either an absolute numeric address or the label of another 
instruction. 

C.4.3 The IF-type If Clause 

An if clause of the form (IF <condition> <address>) or (LCC IF <condition> 
<address>) denotes a conditional branch. The <condition> should be a valid con¬ 
dition code identifier, representing one of the eight available condition bits. If the 
<condition> bit is set (high), then execution branches to the instruction at the 
specified <address>. If the <condition> is clear (low), then execution proceeds to 
the next sequential instruction. Again, the specified <address> may be either an 
absolute location of an instruction label. 
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C.5 Example Instructions 

A few example instructions are show below: 


;; Assigns result to multiple destinations. Since the goto field has 
;; been omitted, the program would proceed to the next instruction. 

(da ((s saddr) (r raddr) (latch)) (oper (r raddrl) (r raddr2))) 

;; First operand can be from the latch instead of register file. The 
;; condition bits of the result are latched by using lcc. 

(da ((s saddr) (r raddr)) (oper (latch) (r raddr2)) (lcc)) 

;; Operation with one operand. The condition bit ccl is tested, and 
;; if it is equal to one, a branch occurs to the instruction marked by 
;: :foo. 


(da (s saddr) (oper (latch)) (if ccl :foo)) 

;; Condition bits of result are latched and the program goes to the 
;; instruction marked by :do. 

(da (r raddr) (oper (r raddrl) (r raddr2)) (lcc goto :do)) 

;; Equivalent to nop but allows use of branching field. The location 
;; none serves as a place holder for the assembler. No actual 
;; location is accessed. 

(da (none) (none) (goto :loop)) 


D The SMM Software Suite 

This portion of the memo describes the software tools available for the Standard Map 
Machine. Section D.l describes the primitive functions added to Scheme which allow 
communication with SMM. In Section D.2 we describe some of the simple procedures 
which exist to facilitate downloading of code to SMM and bidirectional transfer of 
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data. Section D.3 describes breifly how to run the assembler, and Section D.5 lists 
warning messages which might be issued during assembly. 


D.l New Scheme Primitives 

Three primitive operations were added to the CScheme Microcode to allow Scheme 
processes to communicate with SMM. Acually, these primitives will work with any 
HP memory-mapped device, not just SMM. These primitives are defined in the file 

Zurich:/scheme/users/chaos/microcode/chaos.c 

A brief description of each new primitive follows: 

D.1.1 (init-memory-mapped-device string ) 

The init-memory-mapped-device primitive initializes a page of memory for com¬ 
munication with a memory-mapped device and tells the Unix kernel that accesses to 
this area of memory should be directed to a specific device, init-memory-mapped-device 
takes one argument, a string containing the full name of the device file to be accessed 
(we used “/dev/chaos” for SMM). The device file must already exist and be config¬ 
ured to point to the proper slot where the card resides. If init-memory-mapped-device 
succeeds, it returns the base address of the memory block assigned to the device 
(this value is also maintained internal to Scheme; we return it here to signify suc¬ 
cess of the initialization operation). If init-memory-mapped-device fails, it returns 
#! FALSE. 

D.l.2 (write-memory-mapped-device! address data-word ) 

The write-memory-mapped-device! primitive takes as arguments an address and 
a data-word , both integer values, and “writes” the data-word at offset address in the 
memory-mapped device block of reserved memory. When the write to the reserved 
block of memory is attempted, it is converted into an operation to SMM. The address 
value is sent over the address lines to SMM, and the data value is sent over the data 
bus. As both busses are 16 bits wide, the binary representations of address and data 
cannot exceed 16 bits. 
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D.1.3 (read-memory-mapped-device address) 

The read-memory-mapped-device primitive takes an integer address as its argu¬ 
ment and returns the integer data values “stored” at offset address in the device 
memory. Again, the read operation is intercepted by the kernel and changed into a 
request to the actual device. SMM reads the address from the address bus and send 
back 16 bits of data on the data bus. 

D.2 Scheme Utilities 

Once the primitive operations described above were installed into the Scheme sys¬ 
tem, higher-level utilities dealing with SMM could be built. Before using any of 
these functions, the Scheme primitives must be initialized as follows: 


(define init-memory-mapped-device 

(make-primitive-procedure ’init-memory-mapped-device)) 
(define write-memory-mapped-device! 

(make-primitive-procedure ’write-memory-mapped-device!)) 
(define read-memory-mapped-device 

(make-primitive-procedure ’read-memory-mapped-device)) 

(init-memory-mapped-device "/dev/chaos") 


Once the primitives have been initialized, all of the procedures below may be used. 


D.2.1 (stop-clock!) 

The (stop-clock!) procedure halts the internal SMM clock. The clock should 
always be stopped before performing any SMM memory operations from Scheme. 

D.2.2 (start-board value) 

The start-board procedure sends value through the high-order (bits 80-95) instruc¬ 
tion latch to the high-order instruction register. The current values in the other 
instruction latches (bits 0-79) are sent to bits 0-79 of the instruction registers. The 
start-board procedure then starts the SMM system clock. Usually, start-board 
is called with value equal to zero. 
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D.2.3 (download-code instr-vector) 


The download-code procedure takes as an argument a vector of instruction bits 
to be downloaded sequentially to SMM starting at instruction address 0. The 
download-code procedure is generally used in conjunction with * chasm-bits*, the 
output of the assembler. 


D.2.4 (download-data data-list) 

The download-data procedure is used to send constant vales to the SMM SRAM. 
The data-list is a list of one- or two-element lists. A two-element list is interpreted 
as an (address data ) pair, with value data stored at locations address and address -\-1 
(all data values are coerced to double-precision floating point values). A one-element 
list (data) instructs download-data to store the value data in memory immediately 
after the previous store. For example, the list 

CCO 1) (4 7.5) (12)) 


will cause value 1 to be stored at SRAM locations 0 and 1, value 7.5 to be stored 
at locations 4 and 5, and values 12 to be stored at locations 6 and 7. 

D.2.5 (upload-data memory-address) 

The upload-data precedure returns the floating-point value stored at SRAM loca¬ 
tions (memory-address,memory-address+ 1) in the SMM. 


D.2.6 (wait) 

The wait procedure reads the status of the SMM “wait” bit. If the value returned by 
(wait) is even, then SMM is in a wait state and the host may access data memory. 

D.2.7 (done) 

The done procedure tells SMM that the host computer has finished accessing data 
memory and to continue processing. 
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D.3 Using Chasm 

This section details the Scheme procedures provided to access SMM assembler. 


D.3.1 (chasm-source <source>) 

The chasm-source procedure takes one argument, <source>, and runs chasm using 
<source> as the program source code. <source> should be a list of instructions, 
chasm-source displays a message as each phase of assembly begins. 

D.3.2 (chasm-file <filename>) 

The chasm-file procedure assembles a single file of SMM source code. It takes 
one argument, the name of the file containing the source code (without the “.scm” 
extension), chasm-file reads the source file (adding the “.scm” extension), runs 
chasm-source over the source code, and dumps the resulting * chasm-bits* to <filename>.asm. 
Note: as chasm-file uses the Scheme load procedure to read the source file, the 
source code should be the last expression in the file. 

D.3.3 (chasm-file-load <filename>) 

This procedure is identical to the chasm-file procedure, except that it also performs 
a (download-code *chasm-bits*) after the compilation is completed. 


D.3.4 (chasm-file-list <filename>) 

This procedure is identical to the chasm-file procedure, except it also generates a 
listing file <filename>.lst. The listing file shows for each macro-expanded instruc¬ 
tion the address, instruction label, source code, and generated bits. 


D.4 Assembler Caveats and Programming Hints 

Although chasm does a good job of converting RTL into SMM microinstructions, it 
currently fails to warn the programmer of certain types of illegal programs. SMM 
programmers should be aware of the following program constraints inposed by the 
SMM architecture: 
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D.4.1 Multiplier/ALU Restrictions 


D.4.1.1 Load Modes: The load mode for the multiplier and the ALU must be 
set at the beginning of the program. It may be changed later to change the rounding 
mode of the chips. The ALU-LOAD-MODE and MUL-LOAD-MODE macros facilitate setting 
these values. 


D.4.1.2 Latency in the ALU and Multiplier: Results of multiplier opera¬ 
tions are sent to the desired destination ten instruction cycles after the execution of 
the instruction. Results of ALU operations are written to the appropriate destina¬ 
tions twelve cycles after the execution of the instruction. References to destinations 
of multiplier/ALU operations before the 10/12 cycles have passed will use the old 
value of the location. 


D.4.1.3 Multiplier Cycle Time: The pipelining requirements of the Weitek 
multiplier place restrictions on when subsequent double-precision multiplier oper¬ 
ations may start. After a multiplier operation is started, subsequent multiplier 
operations may only be started two, six, and ten cycles after the initial operation 4 . 

D.4.1.4 ALU Cycle Time: ALU pipelining requirements also place restric¬ 
tions on the number of cycles between subsequent ALU operations. If a second 
ALU operation is started before the first ALU operation has finished 5 , the the op¬ 
erations must be separated by an even number of cycles. 

D.4.1.5 Interleaving ALU and Multiplier Operations: When interleaving 
ALU and multiplier operations, there must be at least two instruction cycles between 
a multiplier operation and the previous ALU operation. 


D.4.2 Conditional Branches 

The LCC directive causes the SMM to latch the Weitek condition code bits resulting 
from the operation performed. When using LCC with a double-precision macro (DA 
or DL), the latch occurs on the first of the two single-precision instructions. The 

4 This restriction only applies to multiplier operations started within 12 cycles of the previous 
multiplier operation 

5 That is, there are fewer than 14 cycles between the two instructions. 
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condition codes may be tested using an “if” clause twelve cycles after the LCC has 
been performed 6 . 

The if clause tests the specified condition code bit and causes execution to jump 
to the specified instruction if the condition is true. If clauses in double-precision 
macros occur during the second of the two single-precision instructions 7 . 

Currently, only the ALU comparison instruction is provided by the assembler, al¬ 
though it is simple to add functionality. The comparison function, cmp , takes two 
operands and compares their values. To test if the first operand is less than the 
second, latch the condition code bits and test if ccO is true. To test if the first 
operand is greater than the second, latch the condition code bits and test if ccl is 
true. 


D.4.3 Using the Feed-Through Latch 

The feed-through latch is used for fast serial computations by allowing the result of 
a multiplier/ALU operation to be directly fed back into the multiplier or ALU as 
the A-port input. To use the LATCH destination with the multiplier, the LATCH must 
be used as a source exactly ten cycles later. When using the LATCH destination with 
the ALU, the LATCH must be used as a source exactly twelve cycles later. 


D.4.4 Reading from Data Memory 

When reading from data memory, the entire data memory address is specified in 
the instruction. Data cannot be read from a location in memory ten cycles after a 
multiplier operation which writes to that location, and twelve cycles after an ALU 
operation which writes to that location. 


D.4.5 Writing to Data Memory 

When writing to data memory, only the low four bits of the address (the intra-page 
address) are specified in the actual instruction. The page of memory written to is 

6 That is, there must be the equivalent of at least twelve HOP instructions between the LCC and 
the if clause. 

7 Notice that since if clauses is placed in the second single-precision instruction and LCC directives 
are placed in the first single-precision instruction, only ten instruction cycles are needed between 
double-precision instruction instead of the normal twelve cycles. 
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specified by the result of the last F->IS operation. Page addresses range from 0 to 
255; write address locations for each page range from 0 to 15. 

D.5 Assembler Error Messages 

This section details the possible warning/error messages which might be generated 
by chasm. 

• Unknown Opcode: <op> inserting NOP instruction.: 

The assembler read <op> as an instruction opcode, and <op> is not one of 
the valid instruction types. The instruction in question was replaced with a 
(NOP). 

• Destination not a list: <dest> converting...: 

The destination <dest> was not in list format. All valid destinations are lists. 
The assembler wrapped the input destination in a list and is attempting to 
proceed. 

• Tried to Increment Unknown Dest: <dest> ignoring...: 

• Tried to Increment Unknown Source: <source> ignoring...: 

In the process of macro-expanding a DA or DL instruction, the assembler 
read destination <dest> (source <source>), which it was not able to parse. 
The destination (source) was not incremented for the second single-precision 
instruction. 

• Source is not a list: <source> converting...: The source <source> was 
not in list format. All valid sources are lists. The assembler wrapped the input 
source in a list and is attempting to proceed. 

• Incrementing odd REGISTER <num> in DP...”: 

• Incrementing odd SRAM <num> in DP...”: 

The source to a DA or DL instruction was an odd register or SRAM loca¬ 
tion. These instructions take the low-order word of a double-precision number, 
which by convention is in a even location. 

• Unmatched use of LATCH, no reference beyond end of code.: 

The (LATCH) was used as a destination, but the code vector ends before that 
value is available. 
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• Unmatched use of LATCH, no reference in instr <instr> number 
<future-num>: 

The (LATCH) was used as a destination, but no matching source reference was 
made 12 (for ALU) or 10 (for multiplier) cycles later. 

• Warning: last placed instr not null...: 

The last instruction in the vector, normally clobbered into a (JUMP 0) in¬ 
struction to start the machine, was not a (NOP). This should not happen, 
ever. 

• Unknown Label <label> assuming instruction 0: 

A label was referenced, but no instruction was tagged with it in the source 
program. The assembler has assumed address 0 as the address of the label. 

• Source length too long: <source> ignoring extraneous stuff: 

• Dest length too long: <dest> ingoring extraneous stuff: 

An unusually long (more than two element) source (destination) was read. 
Everything after the first two elements was ignored. 

• Illegal Port: <port> assuming A-PORT: 

An attempt was made to reference the LATCH from a port other than the 
A-PORT. The LATCH is only connected to the A-PORT. 

• Unknown Source: <source> assuming LATCH: 

• Unknown Dest: <dest> assuming LATCH: 

A one-element <source> (or <dest>) was read, but it was neither (NONE) nor 
(LATCH). The assembler has assumed it was a (LATCH) and continued. 

• Unknown Source: <source> assuming REGISTER: 

• Unknown Dest: <dest> assuming REGISTER: 

A two-element <source> (<dest>) was read, but it was neither a register 
reference nor an SRAM access. The assembler has assumed it was a register 
reference. 

• Unknown MathOp: <mathop> assuming +: 

The assembler read a math operator which it didn’t recognize as valid. The 
unknown operator was replaced with +. 
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• Generating bits for unknown port: <port> ignoring: 

In the process of generating the actual bits far an jMtntctkie, the assembler 
was told to use port <porfc>, which it didn’t socg g ttbe as valid. 

• Unknown CC: <cc> swimming code 0 

The unknown condition code <cc> was read in an itclause. The code was 
replaced with cc0. 
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